Projection predictive variable selection for Bayesian regularized SEM
Context: Regularized SEM, i.e., models with many parameters and a penalty function (frequentist) or shrinkage prior (Bayesian).
Goal: Providing a more formal approach to selecting parameters (and thus models) in Bayesian regularized SEM.
MIMIC model drawn with https://semdiag.psychstat.org
A shrinkage prior takes the role of the penalty:
\[ \text{posterior} \propto \text{likelihood} \times \text{prior} \]
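As a toy illustration of this proportionality (not part of the original slides; all names and values here are illustrative), a grid approximation for a normal mean with a normal shrinkage prior can be checked against the analytic conjugate result:

```r
# Grid approximation of: posterior ∝ likelihood × prior
# Toy example: normal likelihood (known sd = 1), normal prior centered at zero
set.seed(1)
y     <- rnorm(20, mean = 0.5, sd = 1)        # simulated data
theta <- seq(-2, 2, length.out = 1001)        # parameter grid
prior <- dnorm(theta, mean = 0, sd = 0.5)     # shrinkage prior at zero
lik   <- sapply(theta, function(t) prod(dnorm(y, mean = t, sd = 1)))
post  <- lik * prior                          # unnormalized posterior
post  <- post / sum(post)                     # normalize over the grid
post_mean_grid <- sum(theta * post)

# Analytic conjugate posterior mean for comparison
n <- length(y)
post_mean_analytic <- sum(y) / (n + 1 / 0.5^2)
```

The grid-based posterior mean matches the analytic conjugate mean, illustrating that the unnormalized product likelihood × prior carries all the information about the posterior.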
Ideal shrinkage prior: a sharp peak at zero to shrink small, irrelevant effects, combined with heavy tails that leave large, relevant effects largely unshrunken.
Many different shrinkage priors exist (see e.g., Van Erp, Oberski, and Mulder (2019)) and several have been applied in SEM (see Van Erp (2023) for an overview). Here, we use the regularized horseshoe prior (Piironen and Vehtari (2017a)).
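A minimal base-R sketch of drawing coefficients from the regularized horseshoe prior, following the hierarchy in Piironen and Vehtari (2017a); the hyperparameter values below are illustrative assumptions, not the deck's settings:

```r
# Regularized horseshoe:
#   beta_j ~ N(0, tau^2 * lambda_tilde_j^2)
#   lambda_tilde_j^2 = c^2 * lambda_j^2 / (c^2 + tau^2 * lambda_j^2)
set.seed(1)
p      <- 10
tau    <- abs(rcauchy(1, 0, 0.1))            # global shrinkage, half-Cauchy
lambda <- abs(rcauchy(p, 0, 1))              # local shrinkage, half-Cauchy
c2     <- 1 / rgamma(1, shape = 2, rate = 2) # slab width c^2, inverse-gamma
lambda_tilde2 <- c2 * lambda^2 / (c2 + tau^2 * lambda^2)
beta   <- rnorm(p, 0, tau * sqrt(lambda_tilde2))
# Small lambda_j: beta_j is shrunk toward zero;
# large lambda_j: the sd is capped near the slab scale c, unlike the
# plain horseshoe, whose tails let coefficients escape shrinkage entirely.
```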
In Bayesian regularized SEM, parameters are not automatically set to zero, so a separate selection step is needed.
Goal: Finding a smaller submodel that predicts practically as well as the larger reference model.
See e.g., Piironen and Vehtari (2017b), Pavone et al. (2020), Piironen, Paasiniemi, and Vehtari (2020), or McLatchie et al. (2023)
Idea behind posterior projection: replace the reference model posterior \(p(\theta_*|D)\) with a simpler, restricted model \(q_\perp (\theta)\).
Implementation: minimize the KL divergence between the induced predictive distributions:
\[ \theta_\perp = \underset{\theta}{\arg\min}\ \text{KL}\left( p(\tilde{y} \mid \theta_*) \,\|\, p(\tilde{y} \mid \theta) \right) \]
Practical implementation: instead of numerical optimization, for Gaussian linear models an analytical solution is available (Piironen, Paasiniemi, and Vehtari (2020)):
\[ \beta_{proj} = (X^T X)^{-1} X^T \mu_* \] where \(\mu_*\) denotes the predictions based on the reference model and \(X\) the design matrix of the restricted model.
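A base-R sketch of this analytical projection (variable names and the simulated setup are illustrative; in practice projpred performs this step internally):

```r
# Analytical projection for a Gaussian linear model:
#   beta_proj = (X'X)^{-1} X' mu_star
set.seed(1)
n <- 100
X_full    <- cbind(1, matrix(rnorm(n * 5), n, 5))  # intercept + 5 predictors
beta_true <- c(1, 2, -1.5, 0, 0, 0)                # only 2 relevant predictors
mu_star   <- drop(X_full %*% beta_true)            # reference-model predictions
X_sub     <- X_full[, 1:3]                         # submodel: intercept + x1 + x2
beta_proj <- drop(solve(t(X_sub) %*% X_sub, t(X_sub) %*% mu_star))
# Equivalently: the least-squares regression of mu_star on X_sub,
# i.e. coef(lm(mu_star ~ X_sub - 1))
```

Because the projection is just a regression of the reference-model predictions on the submodel's design matrix, the submodel here recovers the relevant coefficients exactly, since the dropped predictors have zero true effects.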
Available in the R package projpred (Piironen et al. (2023)).
library(lavaan)
library(brms)
library(projpred)
mod <- 'F =~ y1 + y2 + y3 + y4 + y5
F ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10'
fit.lavaan <- sem(mod, data = df)
fs <- lavPredict(fit.lavaan, method = "Bartlett")
df$fs <- as.vector(fs)
# Regularized horseshoe prior on the regression coefficients
prior_hs <- prior(horseshoe(), class = "b")
refm_fit <- brm(fs ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10,
data = df,
prior = prior_hs)
refm_obj <- get_refmodel(refm_fit)
cvvs <- cv_varsel(
refm_obj,
cv_method = "kfold",
K = 10
)
plot(cvvs)
Important caveat: these results are based on fewer than 100 replications, without cross-validating the search part.
| Variable | # of levels | Values |
|---|---|---|
| \(N_{train}\) | 1 | 150 |
| \(p_{tot}\) | 4 | % of \(N_{train}\) = c(10%, 50%, 100%, 150%) |
| \(p_{rel}\) | 3 | \(p_{rel}\)/\(p_{tot}\) = c(1/3, 1/10, 1/100) |
| \(\rho_{AR}\) | 2 | c(0.25, 0.75) |
| \(\beta\) | 1 | 1/3 small, 1/3 medium, 1/3 large |
The suggest_size heuristic in projpred can be influential. projpred seems to perform well, especially in very sparse settings.
Feel free to reach out during this conference, or via e-mail: s.j.vanerp@uu.nl.
These slides are available online at: https://github.com/sara-vanerp/presentations
Sara van Erp